Managing Terabyte-Scale Investigations with Similarity Digests

Author

Vassil Roussev

Abstract

The relentless increase in storage capacity and decrease in storage cost present an escalating challenge for digital forensic investigations: current forensic technologies are not designed to scale to the degree necessary to process ever-increasing volumes of digital evidence. This paper describes a similarity-digest-based approach that scales up the task of finding related digital artifacts in massive data sets. The results show that, on commodity multi-core computing systems, digests can be generated at rates exceeding those of cryptographic hashes. Moreover, querying the digest of a large (1 TB) target for the (trace) presence of a small file completes in less than one second with very high precision and recall.
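
The abstract does not spell out how a digest is built or queried. As a rough illustration only, the Python sketch below assumes a simplified scheme: 64-byte windows are selected by a content-defined rule (so the same bytes yield the same features at any offset), their SHA-256 hashes are inserted into a single Bloom filter, and a query scores the fraction of a small file's features found in the target's filter. The window size, selection rate, filter size, hash choice, and all function names are hypothetical; the hash-based selection rule is a stand-in, not the paper's feature-selection method, and the sketch makes no claim about throughput.

```python
import hashlib
import random

WINDOW = 64            # feature length in bytes (hypothetical choice)
SELECT_MOD = 16        # keep roughly 1 in 16 windows (hypothetical rate)
FILTER_BITS = 1 << 20  # size of the single Bloom filter in bits (hypothetical)
NUM_HASHES = 5         # bit positions set per feature (hypothetical)


def select_features(data: bytes) -> list[bytes]:
    """Return SHA-256 hashes of 64-byte windows chosen by a content-defined
    rule, so identical content yields identical features at any offset."""
    features = []
    for i in range(len(data) - WINDOW + 1):
        h = hashlib.sha256(data[i:i + WINDOW]).digest()
        if int.from_bytes(h[:2], "big") % SELECT_MOD == 0:
            features.append(h)
    return features


class BloomDigest:
    """A single Bloom filter holding the selected features of one object."""

    def __init__(self, bits: int = FILTER_BITS, k: int = NUM_HASHES):
        self.bits = bits
        self.k = k
        self.array = bytearray(bits // 8)

    def _positions(self, h: bytes):
        # Use bytes 4.. of the hash so positions do not reuse the bytes h[:2]
        # that drove feature selection.
        for j in range(self.k):
            chunk = h[4 + 4 * j: 8 + 4 * j]
            yield int.from_bytes(chunk, "big") % self.bits

    def add(self, h: bytes) -> None:
        for pos in self._positions(h):
            self.array[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, h: bytes) -> bool:
        return all(self.array[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(h))


def make_digest(data: bytes) -> BloomDigest:
    """Build the (hypothetical) similarity digest of a target object."""
    digest = BloomDigest()
    for h in select_features(data):
        digest.add(h)
    return digest


def containment(query: bytes, target: BloomDigest) -> float:
    """Fraction of the query's features found in the target digest."""
    feats = select_features(query)
    if not feats:
        return 0.0
    return sum(h in target for h in feats) / len(feats)


# Usage: a small "file" cut from the target versus unrelated random bytes.
rng = random.Random(0)
target_data = rng.randbytes(100_000)
target_digest = make_digest(target_data)
embedded = target_data[40_000:48_192]
unrelated = rng.randbytes(8_192)
print(containment(embedded, target_digest))   # 1.0: Bloom filters have no false negatives
print(containment(unrelated, target_digest))  # near 0.0, up to false positives
```

Because every window of the embedded slice is also a window of the target, its features are always found, while unrelated data matches only through Bloom-filter false positives. A production system would also need to bound filter occupancy for very large targets, which this single-filter sketch ignores.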


Similar resources

Content triage with similarity digests: The M57 case study

In this work we illustrate the use of similarity digests for the purposes of forensic triage. We use a case that consists of 1.5 TB of raw data, including disk images, network captures, RAM snapshots, and USB flash media. We demonstrate that by applying similarity digests in a systematic manner, the scope of examination can be narrowed down within a matter of minutes to hours. In contrast, conv...

Data Fingerprinting with Similarity Digests

State-of-the-art techniques for data fingerprinting have been based on randomized feature selection pioneered by Rabin in 1981. This paper proposes a new, statistical approach for selecting fingerprinting features. The approach relies on entropy estimates and a sizeable empirical study to pick out the features that are most likely to be unique to a data object and, therefore, least likely to tr...

Scalable Data Correlation

The fast capacity growth of cheap storage presents an ever-escalating problem for forensic investigations as currently employed forensic technologies are not designed to scale to the degree necessary to meet the challenge. In this work, we present an approach which seeks to scale up the process of finding related digital artifacts across large data sets by employing an advanced version of our s...

Managing Natural Language Requirements in Large-Scale Software Development

An increasing number of market- and technology-driven software development companies face the challenge of managing several thousands of requirements written in natural language. The large number of requirements causes bottlenecks in the requirements management process and calls for increased efficiency in requirements engineering. This thesis presents results from empirical investigations of usi...

Lessons Learned from Managing a Petabyte

The amount of data collected and stored by the average business doubles each year. Many commercial databases are already approaching hundreds of terabytes, and at this rate, will soon be managing petabytes. More data enables new functionality and capability, but the larger scale reveals new problems and issues hidden in “smaller” terascale environments. This paper presents some of these new pro...

Publication date: 2012